Combinatorial Pattern Matching

chapter

Two-Dimensional Pattern Matching with Combined Scaling and Rotation

Christian Hundt, Maciej Liśkiewicz

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 5-17

The problem of two-dimensional pattern matching invariant under a given class of admissible transformations $\mathcal{F}$ is to find in text T matches of transformed versions f(P) of the pattern P, for all f in $\mathcal{F}$ . In this paper, pattern matching invariant under compositions of real scaling and rotation are investigated. We give a new discretization technique for this class of transformations...

chapter

Searching for Gapped Palindromes

Roman Kolpakov, Gregory Kucherov

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 18-30

We study the problem of finding, in a given word, all maximal gapped palindromes verifying two types of constraints, that we call long-armed and length-constrained palindromes. For both classes, we propose algorithms that run in time O(n + S), where S is the number of output palindromes. Both algorithms can be extended to compute biological gapped palindromes within the same time bound.

chapter

Parameterized Algorithms and Hardness Results for Some Graph Motif Problems

Nadja Betzler, Michael R. Fellows, Christian Komusiewicz, Rolf Niedermeier

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 31-43

We study the NP-complete Graph Motif problem: given a vertex-colored graph G = (V,E) and a multiset M of colors, does there exist an S ⊆ V such that G[S] is connected and carries exactly (also with respect to multiplicity) the colors in M? We present an improved randomized algorithm for Graph Motif with running time O(4.32^|M|·|M|²·|E|). We extend our algorithm to list-colored graph vertices and the...

chapter

Finding Largest Well-Predicted Subset of Protein Structure Models

Shuai Cheng Li, Dongbo Bu, Jinbo Xu, Ming Li

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 44-55

How to evaluate the quality of models is a basic problem for the field of protein structure prediction. Numerous evaluation criteria have been proposed, and one of the most intuitive criteria requires us to find a largest well-predicted subset — a maximum subset of the model which matches the native structure [12]. The problem is solvable in O(n ⁷) time, albeit too slow for practical usage...

chapter

HP Distance Via Double Cut and Join Distance

Anne Bergeron, Julia Mixtacki, Jens Stoye

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 56-68

The genomic distance problem in the Hannenhalli-Pevzner theory is the following: Given two genomes whose chromosomes are linear, calculate the minimum number of inversions and translocations that transform one genome into the other. This paper presents a new distance formula based on a simple tree structure that captures all the delicate features of this problem in a unifying way.

chapter

Fixed Parameter Tractable Alignment of RNA Structures Including Arbitrary Pseudoknots

Mathias Möhl, Sebastian Will, Rolf Backofen

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 69-81

We present an algorithm for computing the edit distance of two RNA structures with arbitrary kinds of pseudoknots. A main benefit of the algorithm is that, despite the problem is NP-hard, the algorithmic complexity adapts to the complexity of the RNA structures. Due to fixed parameter tractability, we can guarantee polynomial run-time for a parameter which is small in practice. Our algorithm can be...

chapter

Faster Algorithm for the Set Variant of the String Barcoding Problem

Leszek Gąsieniec, Cindy Y. Li, Meng Zhang

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 82-94

A string barcoding problem is defined as to find a minimum set of substrings that distinguish between all strings in a given set of strings ${\cal S}$ . In a biological sense the given strings represent a set of genomic sequences and the substrings serve as probes in a hybridisation experiment. In this paper, we study a variant of the string barcoding problem in which the substrings have to be chosen...

chapter

Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics

Tobias Marschall, Sven Rahmann

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 95-106

We present probabilistic arithmetic automata (PAAs), which can be used to model chains of operations whose operands depend on chance. We provide two different algorithms to exactly calculate the distribution of the results obtained by such probabilistic calculations. Although we introduce PAAs and the corresponding algorithm in a generic manner, our main concern is their application to pattern matching...

chapter

Analysis of the Size of Antidictionary in DCA

Julien Fayolle

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 107-117

We analyze the lossless data compression scheme using antidictionary. Its principle is to build the dictionary of a set of words that do not occur in the text (minimal forbidden words). We prove here that the number of words in the antidictionary, i.e., minimum forbidden words, behaves asymptotically linearly in the length of the text under a memoryless model on the generation of texts. The linearity...

chapter

Approximate String Matching with Address Bit Errors

Amihood Amir, Yonatan Aumann, Oren Kapah, Avivit Levy, more

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 118-129

A string S ∈ Σ ^m can be viewed as a set of pairs S = { (σ _i , i) : i ∈ { 0,..., m − 1} }. We consider approximate pattern matching problems arising from the setting where errors are introduced to the location component (i), rather than the more traditional setting, where errors are introduced to the content itself (σ ...

chapter

On-Line Approximate String Matching with Bounded Errors

Marcos Kiwi, Gonzalo Navarro, Claudio Telha

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 130-142

We introduce a new dimension to the widely studied on-line approximate string matching problem, by introducing an error threshold parameter ε so that the algorithm is allowed to miss occurrences with probability ε. This is particularly appropriate for this problem, as approximate searching is used to model many cases where exact answers are not mandatory. We show that the relaxed version of the problem...

chapter

A Black Box for Online Approximate Pattern Matching

Raphaël Clifford, Klim Efremenko, Benny Porat, Ely Porat

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 143-151

We present a deterministic black box solution for online approximate matching. Given a pattern of length m and a streaming text of length n that arrives one character at a time, the task is to report the distance between the pattern and a sliding window of the text as soon as the new character arrives. Our solution requires $O(\Sigma_{j=1}^{\log_2{m}} T(n,2^{j-1})/n)$ time for each input character,...

chapter

An(other) Entropy-Bounded Compressed Suffix Tree

Johannes Fischer, Veli Mäkinen, Gonzalo Navarro

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 152-165

Suffix trees are among the most important data structures in stringology, with myriads of applications. Their main problem is space usage, which has triggered much research striving for compressed representations that are still functional. We present a novel compressed suffix tree. Compared to the existing ones, ours is the first achieving at the same time sublogarithmic complexity for the operations,...

chapter

On Compact Representations of All-Pairs-Shortest-Path-Distance Matrices

Igor Nitto, Rossano Venturini

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 166-177

Let G be an unweighted and undirected graph of n nodes, and let D be the n ×n matrix storing the All-Pairs-Shortest-Path distances in G. Since D contains integers in [n] ∪ + ∞, its plain storage takes n ²log(n + 1) bits. However, a simple counting argument shows that (n ² − n)/2 bits are necessary to store D. In this paper we investigate the question of finding a succinct representation...

chapter

Computing Inverse ST in Linear Complexity

Ge Nong, Sen Zhang, Wai Hong Chan

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 178-190

The Sort Transform (ST) can significantly speed up the block sorting phase of the Burrows-Wheeler transform (BWT) by sorting only limited order contexts. However, the best result obtained so far for the inverse ST has a time complexity O(Nlogk) and a space complexity O(N), where N and k are the text size and the context order of the transform, respectively. In this paper, we present a novel algorithm...

chapter

Dynamic Fully-Compressed Suffix Trees

Luís M. S. Russo, Gonzalo Navarro, Arlindo L. Oliveira

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 191-203

Suffix trees are by far the most important data structure in stringology, with myriads of applications in fields like bioinformatics, data compression and information retrieval. Classical representations of suffix trees require O(n logn) bits of space, for a string of size n. This is considerably more than the n log₂ σ bits needed for the string itself, where σ is the alphabet size. The...

chapter

A Linear Delay Algorithm for Building Concept Lattices

Martin Farach-Colton, Yang Huang

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 204-216

Concept lattices (also called Galois lattices) have been applied in numerous areas, and several algorithms have been proposed to construct them. Generally, the input for lattice construction algorithms is a binary matrix with size |G||M| representing binary relation I ⊆ G ×M . In this paper, we consider polynomial delay algorithms for building concept lattices. Although the concept lattice may be...

chapter

Matching Integer Intervals by Minimal Sets of Binary Words with don’t cares

Wojciech Fraczak, Wojciech Rytter, Mohammadreza Yazdani

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 217-229

An interval [p,q], where 0 ≤ p ≤ q < 2ⁿ, can be considered as the set X of n-bit binary strings corresponding to encodings of all integers in [p,q]. A word w with don’t care symbols is matching the set L(w) of all words of the length |w| which can differ only on positions containing a don’t care. A set Y of words with don’t cares is matching X iff X = ∪ ...

chapter

Fast Algorithms for Computing Tree LCS

Shay Mozes, Dekel Tsur, Oren Weimann, Michal Ziv-Ukelson

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 230-243

The LCS of two rooted, ordered, and labeled trees F and G is the largest forest that can be obtained from both trees by deleting nodes. We present algorithms for computing tree LCS which exploit the sparsity inherent to the tree LCS problem. Assuming G is smaller than F, our first algorithm runs in time $O(r\cdot {\rm height}(F) \cdot {\rm height}(G)\cdot \lg\lg |G|)$ , where r is the number of...

chapter

Why Greed Works for Shortest Common Superstring Problem

Bin Ma

Lecture Notes in Computer Science > Combinatorial Pattern Matching > Contributed Papers > 244-254

The shortest common superstring problem (SCS) has been widely studied for its applications in string compression and DNA sequence assembly. Although it is known to be Max-SNP hard, the simple greedy algorithm works extremely well in practice. Previous researchers have proved that the greedy algorithm is asymptotically optimal on random instances. Unfortunately, the practical instances in DNA sequence...

INFONA - science communication portal

Combinatorial Pattern Matching
19th Annual Symposium, CPM 2008, Pisa, Italy, June 18-20, 2008 Proceedings

Two-Dimensional Pattern Matching with Combined Scaling and Rotation

Searching for Gapped Palindromes

Parameterized Algorithms and Hardness Results for Some Graph Motif Problems

Finding Largest Well-Predicted Subset of Protein Structure Models

HP Distance Via Double Cut and Join Distance

Fixed Parameter Tractable Alignment of RNA Structures Including Arbitrary Pseudoknots

Faster Algorithm for the Set Variant of the String Barcoding Problem

Probabilistic Arithmetic Automata and Their Application to Pattern Matching Statistics

Analysis of the Size of Antidictionary in DCA

Approximate String Matching with Address Bit Errors

On-Line Approximate String Matching with Bounded Errors

A Black Box for Online Approximate Pattern Matching

An(other) Entropy-Bounded Compressed Suffix Tree

On Compact Representations of All-Pairs-Shortest-Path-Distance Matrices

Computing Inverse ST in Linear Complexity

Dynamic Fully-Compressed Suffix Trees

A Linear Delay Algorithm for Building Concept Lattices

Matching Integer Intervals by Minimal Sets of Binary Words with don’t cares

Fast Algorithms for Computing Tree LCS

Why Greed Works for Shortest Common Superstring Problem

Filter options

Publication date

Keywords

INFONA - science communication portal

Combinatorial Pattern Matching 19th Annual Symposium, CPM 2008, Pisa, Italy, June 18-20, 2008 Proceedings $("#expandableTitles").expandable();

Add recipient

Sending message cancelled

Are you sure you want to cancel sending this message?

Send message

Filter options

Publication date

Date range setting

Set the date range to filter the displayed results. You can set a starting date, ending date or both. You can enter the dates manually or choose them from the calendar.

Keywords

Reporting an error / abuse

Sending the report failed

Accessibility options

Combinatorial Pattern Matching
19th Annual Symposium, CPM 2008, Pisa, Italy, June 18-20, 2008 Proceedings